Parsing TCT with Split Conjunction Categories
Authors
Abstract
We demonstrate that an unlexicalized PCFG with refined conjunction categories can parse much more accurately than previously shown. It does so through simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar and reflect idiosyncratic grammatical properties of Chinese. Indeed, its performance is the best single-model result in the 3rd Chinese Parsing Evaluation. This result shows that refining function words to represent Chinese subcategorization (subcat) frames is an effective method. An unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexicalized models, and its parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.
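The idea of a state split can be illustrated with a minimal sketch. The abstract does not spell out the actual split scheme, so the code below assumes one simple possibility: each conjunction preterminal is relabeled with its word, and rule probabilities are then re-estimated by relative frequency. The toy trees, the conjunction tag `c`, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Toy trees as nested tuples: (label, child, ...); a preterminal is (tag, word).
# These trees and the conjunction tag "c" are illustrative assumptions,
# not data or labels taken from the paper.
TREES = [
    ("np", ("n", "猫"), ("c", "和"), ("n", "狗")),
    ("np", ("n", "书"), ("c", "和"), ("n", "笔")),
    ("fj",
     ("dj", ("n", "雨"), ("v", "停")),
     ("c", "但是"),
     ("dj", ("n", "风"), ("v", "吹"))),
]


def is_preterminal(node):
    """A preterminal pairs a tag with a single word string."""
    return len(node) == 2 and isinstance(node[1], str)


def split_conjunctions(node, conj_tag="c"):
    """Return a copy of the tree with each conjunction tag refined by its word."""
    if is_preterminal(node):
        tag, word = node
        return (f"{tag}-{word}", word) if tag == conj_tag else node
    children = tuple(split_conjunctions(child, conj_tag) for child in node[1:])
    return (node[0],) + children


def rule_counts(trees):
    """Count internal rules as (lhs, (child labels...)); lexical rules are skipped."""
    counts = Counter()

    def visit(node):
        if is_preterminal(node):
            return
        counts[(node[0], tuple(child[0] for child in node[1:]))] += 1
        for child in node[1:]:
            visit(child)

    for tree in trees:
        visit(tree)
    return counts


def rule_probs(trees):
    """Relative-frequency estimate of P(rhs | lhs) for an unlexicalized PCFG."""
    counts = rule_counts(trees)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


if __name__ == "__main__":
    split_trees = [split_conjunctions(tree) for tree in TREES]
    for (lhs, rhs), p in sorted(rule_probs(split_trees).items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

After the split, the grammar records which conjunction a rule contains, so a coordinating conjunction inside a noun phrase and a clause-linking conjunction no longer share a single category.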
Similar resources
The Parsing Algorithm of Translation Corresponding Tree (TCT) Grammar
In machine translation (MT), parsing is a kernel step that analyzes an input sentence and acquires its syntactic information in order to reproduce the corresponding translation in the target language according to the syntactic relationships between the source and target sentences. The parsing process is guided by a set of language formalisms, and the design of such an algorithm is highly dep...
Discriminative Parse Reranking for Chinese with Homogeneous and Heterogeneous Annotations
Discriminative parse reranking has been shown to be an effective technique for improving generative parsing models. In this paper, we present a series of experiments on parsing the Tsinghua Chinese Treebank with hierarchically split-merge grammars, reranked with a perceptron-based discriminative model. In addition to the homogeneous annotation on TCT, we also incorporate the PCTB-based parsi...
A Simplified Chinese Parser with Factored Model
This paper presents our work for the 2012 CIPS-ParsEval shared task on Simplified Chinese parsing. We adopt a factored model to parse Simplified Chinese. The factored model combines a PCFG structure with a dependency structure. It mainly uses an extremely effective A* parsing algorithm, which makes it possible to find a better solution. Throughout this...
Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing
We present a discriminative, latent variable approach to syntactic parsing in which rules exist at multiple scales of refinement. The model is formally a latent variable CRF grammar over trees, learned by iteratively splitting grammar productions (not categories). Different regions of the grammar are refined to different degrees, yielding grammars which are three orders of magnitude smaller tha...
CRF tagging for head recognition based on Stanford parser
Chinese parsing has received increasing attention. In this paper, we use a toolkit to parse the Tsinghua Chinese Treebank (TCT) data used in CIPS, and we use Conditional Random Fields (CRFs) to train a dedicated model for head recognition. Finally, we compare the results obtained with different POS tagging results.
Publication date: 2012